24 research outputs found

    Formalization of Phase Ordering

    Phasers are an interesting synchronization mechanism that generalizes many collective synchronization patterns seen in parallel programming languages, including barriers, clocks, and point-to-point synchronization using latches or semaphores. This work characterizes scheduling constraints on phaser operations by relating the execution states of two tasks that operate on the same phaser. We propose a formalization of Habanero phasers, together with May-Happen-In-Parallel and Happens-Before relations for phaser operations, and show that these relations conform with the semantics. Our formalization and proofs are fully mechanized using the Coq proof assistant and are available online. Comment: In Proceedings PLACES 2016, arXiv:1606.0540

    Dynamic Determinacy Race Detection for Task-Parallel Programs with Promises

    Much of the past work on dynamic data-race and determinacy-race detection algorithms for task parallelism has focused on structured parallelism with fork-join constructs and, more recently, with future constructs. This paper addresses the problem of dynamic detection of data-races and determinacy-races in task-parallel programs with promises, which are more general than fork-join constructs and futures. The motivation for our work is twofold. First, promises have now become a mainstream synchronization construct, with their inclusion in multiple languages, including C++, JavaScript, and Java. Second, past work on dynamic data-race and determinacy-race detection for task-parallel programs does not apply to programs with promises, thereby identifying a vital need for this work. This paper makes multiple contributions. First, we introduce a featherweight programming language that captures the semantics of task-parallel programs with promises and provides a basis for formally defining determinacy using our semantics. This definition subsumes functional determinacy (same output for same input) and structural determinacy (same computation graph for same input). The main theoretical result shows that the absence of data races is sufficient to guarantee determinacy with both properties. We are unaware of any prior work that established this result for task-parallel programs with promises. Next, we introduce a new Dynamic Race Detector for Promises that we call DRDP. DRDP is the first known race detection algorithm that executes a task-parallel program sequentially without requiring the serial-projection property; this is a critical requirement since programs with promises do not satisfy the serial-projection property in general. Finally, the paper includes experimental results obtained from an implementation of DRDP. 
The results show that, with some important optimizations introduced in our work, the space and time overheads of DRDP are comparable to those of more restrictive race detection algorithms from past work. To the best of our knowledge, DRDP is the first determinacy race detector for task-parallel programs with promises.

    Superconductivity suppression of Ba0.5K0.5Fe2-2xM2xAs2 single crystals by substitution of transition-metal (M = Mn, Ru, Co, Ni, Cu, and Zn)

    We investigated the doping effects of magnetic and nonmagnetic impurities on single-crystalline p-type Ba0.5K0.5Fe2-2xM2xAs2 (M = Mn, Ru, Co, Ni, Cu, and Zn) superconductors. The superconductivity is robust against Ru impurities but weak against impurities of Mn, Co, Ni, Cu, and Zn. However, the observed Tc suppression rate for both magnetic and nonmagnetic impurities remains much lower than expected for the s±-wave model. The temperature dependence of the resistivity shows an obvious low-T upturn for crystals doped with a high level of impurities, which is due to the occurrence of localization. Thus, the relatively weak Tc suppression by Mn, Co, Ni, Cu, and Zn is considered a result of localization rather than of the pair-breaking effect in the s±-wave model. Comment: 8 pages, 9 figures, to be published in Phys. Rev.

    Unified Polyhedral Modeling of Temporal and Spatial Locality

    Despite decades of work in this area, the construction of effective loop nest optimizers and parallelizers continues to be challenging due to the increasing diversity of both loop-intensive application workloads and complex memory/computation hierarchies in modern processors. The lack of a systematic approach to optimizing locality and parallelism, with a well-founded data locality model, is a major obstacle to the design of optimizing compilers that can cope with the variety of software and hardware. Acknowledging the conflicting demands on loop nest optimization, we propose a new unified algorithm for optimizing parallelism and locality in loop nests that is capable of modeling temporal and spatial effects of multiprocessors and accelerators with deep memory hierarchies and multiple levels of parallelism. It orchestrates a collection of parameterizable optimization problems for locality and parallelism objectives over a polyhedral space of semantics-preserving transformations. The overall problem is not convex and is only constrained by semantics preservation. We discuss the rationale for this unified algorithm, and validate it experimentally on a collection of representative computational kernels and benchmarks.

    Hierarchical phasers for scalable synchronization and reductions in dynamic parallelism

    Major crossroads in the computer industry: processor clock speeds are no longer increasing, so chips come with an increasing number of cores instead. This is a challenge for software enablement on future systems.

    Language Extensions in Support of Compiler

    In this paper, we propose an approach to automatic compiler parallelization based on language extensions that is applicable to a broader range of program structures and application domains than in past work. As a complement to ongoing work on high productivity languages for explicit parallelism, the basic idea in this paper is to make sequential languages more amenable to compiler parallelization by adding enforceable declarations and annotations. Specifically, we propose the addition of annotations and declarations related to multidimensional arrays, points, regions, array views, parameter intents, array and object privatization, pure methods, absence of exceptions, and gather/reduce computations. In many cases, these extensions are also motivated by best practices in software engineering, and can also contribute to performance improvements in sequential code. A detailed case study of the Java Grande Forum benchmark suite illustrates the obstacles to compiler parallelization in current object-oriented languages, and shows that the extensions proposed in this paper can be effective in enabling compiler parallelization. The results in this paper motivate future work on building an automatically parallelizing compiler for the language extensions proposed in this paper.

    Parallelizing compilation scheme for reduction of power consumption of chip multiprocessors

    With the advance of semiconductor technology, chip multiprocessor (multicore processor) architectures have attracted much attention as a way to achieve low power consumption, high effective performance, good cost performance, and short hardware/software development periods. To this end, parallelizing compilers for chip multiprocessors are expected to parallelize programs effectively and to carefully control the voltage and clock frequency of processors and storage from within an application program. This paper proposes a parallelizing compilation scheme with power reduction control for a multigrain parallel processing environment; the scheme controls the voltage/frequency and power supply of each processor core on a chip. In the evaluation, the OSCAR compiler with the proposed scheme achieves a 60.7 percent energy reduction for SPEC CFP95 applu without performance degradation on 4 processors, an 85.6 percent energy reduction for SPEC CFP95 tomcatv with a real-time deadline constraint on 4 processors, and an 86.7 percent energy reduction for SPEC CFP95 swim with the deadline constraint on 4 processors.